Bio
         
        
        
    I’m the Carnegie Bosch Professor of Business Technologies and Marketing and Associate Dean for Research at Carnegie Mellon University’s Tepper School of Business. My research focuses on how AI and algorithmic systems are transforming markets, shaping consumer trust, and influencing platform strategy, pricing, and fairness. I’m passionate about helping organizations and policymakers understand and navigate the responsible deployment of emerging technologies.
  
  
    I lead the Collaborative AI Initiative at CMU—an effort to reimagine business education through generative AI. We’re building adaptive, interactive learning environments that mirror real-world complexity and better prepare students for the decisions they’ll face in modern organizations.
  
  
    My work has been featured in the Wall Street Journal, Financial Times, CNN, Forbes, Bloomberg, and cited in the U.S. President’s Economic Report to Congress. I’ve received multiple honors, including the INFORMS Information Systems Society Distinguished Fellow Award, the Don Lehmann Award, the John DC Little Award, and the Don Morrison Long-Term Impact Award (Finalist).
  
  
    I’ve served as Director of the PNC Center for Financial Services Innovation, leading a $5.5M renewal of its research program. I also serve as Senior Editor at Information Systems Research and Associate Editor at Management Science. I’ve had the privilege of mentoring several Ph.D. students who’ve gone on to faculty roles at Harvard, NYU, Michigan, and other top institutions.
  
  I always welcome thoughtful conversations—whether you're a student, researcher, policymaker, or industry leader working on the future of AI and innovation. Feel free to reach out.
Curriculum Vitae (Updated May 2025)

Email: psidhu@cmu.edu
Tel: +1 (412) 268-3585
Address:
David A. Tepper School of Business
Tepper Quad 5137, Carnegie Mellon University
Pittsburgh, PA 15213
U.S.A.
        Teaching
          45882: Digital Marketing and Social Media Strategy (MBA)
          47952: Estimating Dynamic and Structural Models (PhD)
          47954: Generative AI: Economic and Social Aspects (PhD)
        
        PhD Students 
        Current PhD Students
          
          Liying Qiu
          
        
Past PhD Students (bold = Chair/Co-Chair of dissertation committee; first placement listed)
Qiaochu Wang (New York University)
Runshan Fu (New York University)
Nikhil Malik (University of Southern California)
Shunyuan Zhang (Harvard Business School)
Elina Hwang (University of Washington)
Yan Huang (University of Michigan)
Yingda Lu (Rensselaer Polytechnic Institute)
Xiao Liu (New York University)
Vilma Todri (Emory University)
        Prospective PhD Students
          (i) My research merges Economics and Computer Science. A genuine
          interest in both fields is vital. (ii) We prioritize the rigor of the
          courses you've taken and your performance in them over general GPA.
          It's crucial to highlight challenging classes in your application,
          especially those like stochastic processes and real analysis that
          demand strong logical and formal proofs. (iii) When applying, select
          'Business Technology' and 'Marketing' as your top two choices (in any
          order) to ensure consideration in both areas.
        
        Publications
        
        (with Qiaochu Wang and Yan Huang)
        Marketing Science, forthcoming
          
Abstract:
            Machine learning algorithms are increasingly used to evaluate borrower creditworthiness in financial lending, yet many lenders do not provide pre-approval tools that could significantly benefit consumers. These tools are essential for reducing consumer uncertainty and improving financial decision-making. This paper examines why symmetric lenders, with equal non-price features and algorithmic accuracy, might asymmetrically reveal pre-approval outcomes. Using a multi-stage game theory model, we analyze the strategic decisions of duopoly lenders in offering pre-approval tools for unsecured financial products. Our findings suggest that high algorithm accuracy can sustain an asymmetric revelation equilibrium, with one lender revealing pre-approval outcomes through pre-approval tools while the other does not, even when there is no explicit cost of providing such pre-approval tools. Conversely, low algorithm accuracy prompts both lenders to reveal pre-approval outcomes. These findings diverge from traditional literature, which typically associates asymmetric revelation with differentiated products or revealing cost. Additionally, our results show that mandatory revelation policies could reduce lenders' incentives to improve algorithmic accuracy, potentially harming social welfare. These insights inform managerial strategies on the use of algorithmic transparency in lending and underscore the need for careful consideration of regulatory policies to balance market efficiency and consumer protection. 
                
        (with Liying Qiu, Yan Huang and Kannan
          Srinivasan)
        Marketing Science, forthcoming
          
Abstract:
Our study investigates the impact of product ranking systems on artificial intelligence (AI) powered pricing algorithms. Specifically, we examine the effects of “personalized” and “unpersonalized” ranking systems on algorithmic pricing outcomes and consumer welfare. Our analysis reveals that personalized ranking systems, which rank products in decreasing order of consumers’ utilities, may encourage higher prices charged by pricing algorithms, especially when consumers search for products sequentially on a third-party platform. This is because personalized ranking significantly reduces the ranking-mediated price elasticity of demand and thus the incentives to lower prices. Conversely, unpersonalized ranking systems lead to significantly lower prices and greater consumer welfare. These findings suggest that even in the absence of price discrimination, personalization may not necessarily benefit consumers, since pricing algorithms can undermine consumer welfare through higher prices. Thus, our study highlights the crucial role of ranking systems in shaping algorithmic pricing behaviors and consumer welfare.
        Online
            Appendix
            
        (with Runshan Fu, Yan Huang, Nitin Mehta and
          Kannan Srinivasan)
        Marketing Science, forthcoming
          
Abstract:
            We study the impact of Zillow’s Zestimate on housing market outcomes
            and how the impact differs across socio-economic segments. Zestimate
            is produced by a Machine Learning algorithm using large amounts of
            data and aims to predict a home’s market value at any time.
            Zestimate can potentially help market participants in the housing
            market as identifying the value of a home is a non-trivial task.
However, an inaccurate Zestimate could also lead to incorrect beliefs
            about property values and therefore suboptimal decisions, which
            would hinder the selling process. Meanwhile, Zestimate tends to be
            systematically more accurate for rich neighborhoods than poor
            neighborhoods, raising concerns that the benefits of Zestimate may
            accrue largely to the rich, which could widen socio-economic
            inequality. Using data on Zestimate and housing sales in the United
            States, we show that Zestimate overall benefits the housing market,
            as on average it increases both buyer surplus and seller profit.
            This is primarily because its uncertainty reduction effect allows
            sellers to be more patient and set higher reservation prices to wait
            for buyers who truly value the properties, which improves
            seller-buyer match quality. Moreover, Zestimate actually reduces
            socio-economic inequality, as our results reveal that both rich and
            poor neighborhoods benefit from Zestimate but the poor neighborhoods
            benefit more. This is because poor neighborhoods face greater prior
            uncertainty and therefore would benefit more from new signals. 
        
        
        (with Nikhil Malik and Kannan Srinivasan)
        Information Systems Research, 35(4),
          2024, 1524-1545. 
Abstract:
            We compare the career outcomes of MBA graduates with attractive and
            plain-looking faces. Our findings reveal that attractive MBA
            graduates have a significantly higher probability (52.4%) of holding
            more desirable jobs compared to their plain-looking counterparts 15
            years after obtaining their MBA degree, resulting in a 15-year
            attractiveness premium of 2.4%. This premium corresponds to an
            annual salary differential of $2,508. Additionally, we observed an
            "extreme" attractiveness premium of over 11% for the top 10% most
            attractive graduates, leading to a yearly salary differential of
            $5,528. Notably, this attractiveness premium remains consistent over
            time. Moreover, the attractiveness premium is more pronounced among
            arts undergraduate graduates, those in managerial roles or the
            management industry, as opposed to those with IT backgrounds or
            working in technical jobs or the IT industry post MBA. To achieve
            these results, we devised a robust methodological framework that
            combines custom Machine Learning models. These models generate a
            time series of an individual's attractiveness through morphing a
            single profile picture and determine career success by ranking job
            titles based on revealed preferences in job transitions.
            Additionally, we employed a quasi-experiment design using propensity
            score matching to ensure the accuracy and reliability of our
            analysis.
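For readers unfamiliar with the propensity score matching step mentioned above, the following is a bare-bones, generic sketch in Python (synthetic data and hypothetical names such as attractive and outcome; it illustrates the general technique only, not the paper's actual code):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import NearestNeighbors

    # Synthetic example: covariates, a binary "treatment" (attractive vs. plain-looking),
    # and a career-outcome measure.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))                      # observed covariates
    attractive = (X[:, 0] + rng.normal(0, 1, 500) > 0).astype(int)
    outcome = 2.0 * attractive + X @ np.array([1.0, 0.5, -0.3]) + rng.normal(0, 1, 500)

    # 1. Estimate propensity scores P(attractive = 1 | covariates).
    ps = LogisticRegression().fit(X, attractive).predict_proba(X)[:, 1]

    # 2. Match each treated unit to the control unit with the closest propensity score.
    treated_idx = np.where(attractive == 1)[0]
    control_idx = np.where(attractive == 0)[0]
    nn = NearestNeighbors(n_neighbors=1).fit(ps[control_idx].reshape(-1, 1))
    _, match = nn.kneighbors(ps[treated_idx].reshape(-1, 1))
    matched_controls = control_idx[match.ravel()]

    # 3. Average outcome difference between treated units and their matches (ATT).
    att = (outcome[treated_idx] - outcome[matched_controls]).mean()
    print(round(att, 2))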
        
        
        (with Qiaochu Wang, Yan Huang and Stefanus
          Jasin)
        Management Science, 69(4), 2023,
          2297-2317. 
          
        AIS Senior Scholar's Best Paper Award 2024,
            Winner
Abstract:
            Should firms that apply machine learning algorithms in their
            decision making make their algorithms transparent to the users they
            affect? Despite the growing calls for algorithmic transparency, most
            firms keep their algorithms opaque, citing potential gaming by users
            that may negatively affect the algorithm’s predictive power. In this
            paper, we develop an analytical model to compare firm and user
            surplus with and without algorithmic transparency in the presence of
            strategic users and present novel insights. We identify a broad set
            of conditions under which making the algorithm transparent actually
            benefits the firm. We show that, in some cases, even the predictive
            power of the algorithm can increase if the firm makes the algorithm
            transparent. By contrast, users may not always be better off under
            algorithmic transparency. These results hold even when the
            predictive power of the opaque algorithm comes largely from
            correlational features and the cost for users to improve them is
            minimal. We show that these insights are robust under several
            extensions of the main model. Overall, our results show that firms
            should not always view manipulation by users as bad. Rather, they
            should use algorithmic transparency as a lever to motivate users to
            invest in more desirable features.
        Online
            Appendix
        
        
        (with Nikhil Malik, Manmohan Aseri and Kannan
          Srinivasan)
        Management Science, 68(10), 2022,
          7065-7791.
Abstract:
            Bitcoin falls dramatically short of the scale provided by banks for
payments. Currently, its ledger grows by the addition of blocks of roughly 2,000 transactions every 10 minutes. Intuitively, one would
            expect that increasing the block capacity would solve this scaling
            problem. However, we show that increasing the block capacity would
            be futile. We analyze strategic interactions of miners, who are
            heterogeneous in their power over block addition, and users, who are
            heterogeneous in the value of their transactions, using a
game-theoretic model. We show that a capacity increase can enable large miners to tacitly collude, artificially undoing the capacity increase by strategically adding partially filled blocks in order to extract economic rents. This strategic partial filling
            crowds out low value payments. Collusion is sustained if the
            smallest colluding miner has a share of block addition power above a
            lower bound. We provide empirical evidence of such strategic partial
            filling of blocks by large miners of Bitcoin. We show that a
            protocol design intervention can breach the lower bound and
            eliminate collusion. However, this also makes the system less
            secure. On the one hand, collusion crowds out low-value payments; on
            the other hand, if collusion is suppressed, security threatens
high-value payments. As a result, it is untenable to include a range of payments with vastly different outside options, willingness to bear security risk, and tolerance for delay on a single chain. Thus, we show
            economic limits to the scalability of Bitcoin. Under these economic
            limits, collusive rent extraction acts as an effective mechanism to
            invest in platform security and build responsiveness to demand
shocks. These traits are otherwise hard to attain in a disintermediated setting owing to the high cost of consensus.
        Online
            Appendix
        
        (with Runshan Fu, Manmohan Aseri and Kannan
          Srinivasan)
        Management Science, 68(6), 2022,
          4173-4195. 
        Best Paper in Management Science 2019-2022, Information Systems, Finalist
Abstract:
Ensuring fairness in algorithmic decision making is a crucial policy issue. Current legislation ensures fairness by barring algorithm designers from using demographic information in their decision making. As a result, to be legally compliant, the algorithms need to ensure equal treatment. However, in many cases, ensuring equal treatment leads to disparate impact, particularly when there are differences among groups based on demographic classes. In response, several “fair” machine learning (ML) algorithms that require impact parity (e.g., equal opportunity) at the cost of equal treatment have recently been proposed to adjust for societal inequalities. Advocates of fair ML propose changing the law to allow the use of protected class-specific decision rules. We show that the proposed fair ML algorithms that require impact parity, while conceptually appealing, can make everyone worse off, including the very class they aim to protect. Compared with the current law, which requires treatment parity, the fair ML algorithms, which require impact parity, limit the benefits of a more accurate algorithm for a firm. As a result, profit-maximizing firms could underinvest in learning, that is, improving the accuracy of their machine learning algorithms. We show that the investment in learning decreases when misclassification is costly, which is exactly the case when greater accuracy is otherwise desired. Our paper highlights the importance of considering the strategic behavior of stakeholders when developing and evaluating fair ML algorithms. Overall, our results indicate that fair ML algorithms that require impact parity, if turned into law, may not be able to deliver some of the anticipated benefits.
        Online
            Appendix
        
        (with Shunyuan Zhang, Dokyun Lee and Tridas
          Mukhopadhyay)
Journal of Marketing Research, 59(2), 2022, 374-391.
Don Lehmann Award 2024, Winner
Abstract:
            We examine whether and how ride-sharing services influence the
            demand for home-sharing services. Our identification strategy hinges
            on a natural experiment where Uber/Lyft exited Austin in May 2016 in
response to new regulations. Using 12-month longitudinal data spanning 13,737 Airbnb properties, we find Uber/Lyft’s exit led to a
            decrease of 18.0% in Airbnb demand in Austin. On the supply side,
            the nightly rate went down by 3.9% and the supplied listings
            decreased by 6.8%. Further, the geographic demand dispersion of
            Airbnb decreased and became more concentrated in areas with access
            to better public transportation. The absence of Uber/Lyft reduced
            demand more for lower-end properties—whose customers may be more
            price-sensitive. Further analysis leveraging individual hotel data
            reveals an increase in Austin hotels’ occupancy in the absence of
            Uber/Lyft, with a greater increase for hotels that are more
            substitutable to Airbnb. These results indicate ease of access to
            transportation in residential areas is critical for the success of
            home-sharing services. Any policies that negatively affect
            ride-sharing services would also negatively affect demand for
            home-sharing services. 
        Online
            Appendix 
        
        (with Shunyuan Zhang, Kannan Srinivasan and
          Nitin Mehta)
        Harvard Business Review, September
          17, 2021.
Abstract:
            While companies may spend a lot of time testing models before
launch, many spend too little time considering how they will work in the wild. In particular, they fail to fully consider how rates of
            adoption can warp developers’ intent. For instance, Airbnb launched
            a pricing algorithm to close the earnings gap between Black and
            white hosts. While the algorithm reduced economic disparity among
            adopters by 71.3%, Black hosts were 41% less likely to use it, and
            so in many cases it made the earnings gap wider. The company needed
            to better consider how the algorithm would be perceived, and address
            that in its rollout to encourage its target audience, Black hosts,
            to trust it. This presents two lessons for companies: consider how
            an algorithmic tool will be perceived and create a targeted plan to
            build trust. 
        
        (with Shunyuan Zhang, Dokyun Lee and Kannan
          Srinivasan)
Management Science, 68(8), 2022, 5644-5666.
Management Science Best Paper Award in Marketing, Finalist
Abstract:
            We study how Airbnb property demand changed after the acquisition of
            verified images (taken by Airbnb’s photographers) and explore what
            makes a good image for an Airbnb property. Using deep learning and
            difference-in-difference analyses on an Airbnb panel data set
            spanning 7,423 properties over 16 months, we find that properties
            with verified images had 8.98% higher occupancy than properties
            without verified images (images taken by the host). To explore what
            constitutes a good image for an Airbnb property, we quantify 12
            human-interpretable image attributes that pertain to three artistic
            aspects—composition, color, and the figure-ground relationship—and
            we find systematic differences between the verified and unverified
            images. We also predict the relationship between each of the 12
            attributes and property demand, and we find that most of the
            correlations are significant and in the theorized direction. Our
            results provide actionable insights for both Airbnb photographers
            and amateur host photographers who wish to optimize their images.
            Our findings contribute to and bridge the literature on photography
            and marketing (e.g., staging), which often either ignores the demand
            side (photography) or does not systematically characterize the
            images (marketing).
        Online
            Appendix 
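As background on the difference-in-differences logic used in this study, here is a minimal two-way fixed-effects sketch in Python with toy data (variable names such as verified and occupancy are illustrative placeholders, not the paper's estimation code):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Toy panel: properties observed over months; some obtain verified photos at month 8.
    rng = np.random.default_rng(1)
    n_props, n_months = 50, 16
    df = pd.DataFrame([{"prop": i, "month": t}
                       for i in range(n_props) for t in range(n_months)])
    ever_verified = dict(enumerate(rng.choice([0, 1], size=n_props)))
    df["treated"] = df["prop"].map(ever_verified)
    df["post"] = (df["month"] >= 8).astype(int)
    df["verified"] = df["treated"] * df["post"]        # the difference-in-differences term
    df["occupancy"] = (0.4 + 0.05 * df["verified"] + 0.01 * df["month"]
                       + rng.normal(0, 0.05, len(df)))

    # Property and month fixed effects absorb level differences; the coefficient
    # on `verified` is the DiD estimate of the verified-photo effect.
    fit = smf.ols("occupancy ~ verified + C(prop) + C(month)", data=df).fit()
    print(fit.params["verified"])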
        
        (with Shunyuan Zhang, Nitin Mehta and Kannan
          Srinivasan)
Marketing Science (Frontiers), 40(5), 2021, 813-820.
          John DC Little Award, Finalist
Abstract:
            We study the effect of Airbnb’s smart-pricing algorithm on the
            racial disparity in the daily revenue earned by Airbnb hosts. Our
            empirical strategy exploits Airbnb’s introduction of the algorithm
            and its voluntary adoption by hosts as a quasi-natural experiment.
            Among those who adopted the algorithm, the average nightly rate
            decreased by 5.7%, but average daily revenue increased by 8.6%.
            Before Airbnb introduced the algorithm, white hosts earned $12.16
            more in daily revenue than Black hosts, controlling for observed
            characteristics of the hosts, properties, and locations. Conditional
            on its adoption, the revenue gap between white and Black hosts
            decreased by 71.3%. However, Black hosts were significantly less
            likely than white hosts to adopt the algorithm, so at the population
            level, the revenue gap increased after the introduction of the
            algorithm. We show that the algorithm’s price recommendations are
            not affected by the host’s race—but we argue that the algorithm’s
race-blindness may lead to pricing that is suboptimal, and more so
            for Black hosts than for white hosts. We also show that the
            algorithm’s effectiveness at mitigating the Airbnb revenue gap is
            limited by the low rate of algorithm adoption among Black hosts. We
            offer recommendations with which policy makers and Airbnb may
            advance smart-pricing algorithms in mitigating racial economic
            disparities. 
        Online
            Appendix 
        
        (with Runshan Fu and Yan Huang)
        Information Systems Research, 32(1),
          2021, 72-92. 
          Best Paper in Information Systems Research 2021, Finalist
Abstract:
            Big data and machine learning (ML) algorithms are key drivers of
            many fintech innovations. While it may be obvious that replacing
            humans with machines would increase efficiency, it is not clear
            whether and where machines can make better decisions than humans. We
            answer this question in the context of crowd lending, where
            decisions are traditionally made by a crowd of investors. Using data
            from Prosper.com, we show that a reasonably sophisticated ML
            algorithm predicts listing default probability more accurately than
            crowd investors. The dominance of the machine over the crowd is more
            pronounced for highly risky listings. We then use the machine to
            make investment decisions, and find that the machine benefits not
            only the lenders but also the borrowers. When machine prediction is
            used to select loans, it leads to a higher rate of return for
            investors and more funding opportunities for borrowers with few
            alternative funding options. We also find suggestive evidence that
the machine is biased with respect to gender and race even when it does not use gender and race information as input. We propose a general and effective “debiasing” method that can be applied to any prediction-focused ML application, and demonstrate its use in our context. We
            show that the debiased ML algorithm, which suffers from lower
            prediction accuracy, still leads to better investment decisions
            compared with the crowd. These results indicate that ML can help
            crowd lending platforms better fulfill the promise of providing
            access to financial resources to otherwise underserved individuals
            and ensure fairness in the allocation of these resources. 
        Online
            Appendix 
        
        (with Runshan Fu and Yan Huang)
        Tutorials in Operations Research,
          2020.
Abstract:
            Artificial intelligence (AI) and machine learning (ML) algorithms
            are widely used throughout our economy in making decisions that have
            far-reaching impacts on employment, education, access to credit, and
            other areas. Initially considered neutral and fair, ML algorithms
            have recently been found increasingly biased, creating and
            perpetuating structural inequalities in society. With the rising
            concerns about algorithmic bias, a growing body of literature
            attempts to understand and resolve the issue of algorithmic bias. In
            this tutorial, we discuss five important aspects of algorithmic
bias. We start with its definition and the notions of fairness that policy makers, practitioners, and academic researchers have used and proposed. Next, we note the challenges in identifying and detecting
            algorithmic bias given the observed decision outcome, and we
            describe methods for bias detection. We then explain the potential
            sources of algorithmic bias and review several bias-correction
            methods. Finally, we discuss how agents’ strategic behavior may lead
            to biased societal outcomes, even when the algorithm itself is
            unbiased. We conclude by discussing open questions and future
            research directions. 
        
        (with Vilma Todri and Anindya Ghose)
        Information Systems Research, 31(1),
          2020, 102-125. 
          Best Paper in Information Systems Research 2020, Finalist
Abstract:
            Digital advertisers often harness technology-enabled
            advertising-scheduling strategies, such as ad repetition at the
            individual consumer level, in order to improve advertising
            effectiveness. However, such strategies might elicit annoyance in
            consumers, as indicated by anecdotal evidence such as the popularity
            of ad-blocking technologies. Our study captures this trade-off
            between effective and annoying display advertising. We propose a
            Hidden Markov Model that allows us to investigate both the enduring
            impact of display advertising on consumers' purchase decisions and
            the potential of persistent display advertising to stimulate
            annoyance in consumers. Additionally, we study the structural
            dynamics of these advertising effects by allowing them to be
            contingent on the latent state of the funnel path in which each
            consumer resides. Our findings demonstrate that a tension exists
            between generating interest and triggering annoyance in consumers;
            whereas display advertising has an enduring impact on transitioning
            consumers further down the purchase funnel, persistent
            display-advertising exposures beyond a frequency threshold can have
            an adverse effect by increasing the chances that consumers will be
            annoyed. Investigating the dynamics of these annoyance effects, we
            reveal that consumers who reside in different stages of the purchase
            funnel exhibit considerably different tolerance for annoyance
            stimulation. Our findings also reveal that the format of display
            advertisements, the level of diversification of ad creatives as well
            as consumer demographics moderate consumers' thresholds for
            annoyance elicitation. For instance, advertisers can reduce
            annoyance elicitation as a result of frequent display advertising
exposures when they employ static (rather than animated) display ads
            as well as when they diversify the display ad creatives shown to
            consumers. Our paper contributes to the literature on digital
            advertising and consumer annoyance and has significant managerial
            implications for the online advertising ecosystem. 
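As a generic illustration of the hidden-state machinery underlying such models (toy numbers, not the paper's specification), the snippet below evaluates the likelihood of a sequence of ad responses with the standard HMM forward algorithm, where the latent states can be read as stages of the purchase funnel:

    import numpy as np

    # Toy HMM: 3 latent funnel states, 2 observable actions (0 = no click, 1 = click).
    pi = np.array([0.7, 0.2, 0.1])            # initial state distribution
    A = np.array([[0.80, 0.15, 0.05],         # state transition probabilities
                  [0.10, 0.70, 0.20],
                  [0.05, 0.15, 0.80]])
    B = np.array([[0.95, 0.05],               # emission probabilities for each state
                  [0.70, 0.30],
                  [0.40, 0.60]])

    def forward_loglik(obs):
        """Log-likelihood of an observation sequence (scaled forward algorithm)."""
        alpha = pi * B[:, obs[0]]
        loglik = np.log(alpha.sum())
        alpha = alpha / alpha.sum()
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]
            loglik += np.log(alpha.sum())
            alpha = alpha / alpha.sum()
        return loglik

    print(forward_loglik([0, 0, 1, 0, 1, 1]))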
        
        (with Nikhil Malik)
        Tutorials in Operations Research,
2019.
Abstract:
            Deep learning models have succeeded at a variety of human
            intelligence tasks and are already being used at commercial scale.
            These models largely rely on the standard gradient descent
            optimization of parameters W, which maps an input X to an output y
            ̂=f(X;W). The optimization procedure minimizes the loss (difference)
            between the model output y ̂ and actual output y. As an example, in
            the cancer detection setting, X is an MRI image, while y is the
            presence or absence of cancer. Three key ingredients hint at the
            reason behind deep learning’s power. (1) Deep architectures better
            adapt to breaking down complex functions into a composition of
            simpler abstract parts. (2) Standard gradient descent methods that
            attain local minima on a nonconvex Loss(y,y ̂) function that are
            close enough to the global minima. (3) Architectures suited for
            execution on parallel computing hardware (e.g., GPUs), thus making
            the optimization viable over hundreds of millions of observations
            (X,y). Computer vision tasks, where input X is a high-dimensional
            image or video, are particularly suited to deep learning
applications. Recent advances in deep architectures, e.g., inception
            modules, attention networks, adversarial networks and DeepRL, have
            opened up completely new applications that were previously
            unexplored. However, the breakneck progress to replace human tasks
            with deep learning comes with caveats. These deep models tend to
            evade interpretation, lack causal relationships between input X and
            output y and may inadvertently mimic not just human actions but
            human biases and stereotypes. In this tutorial, we provide an
            intuitive explanation of deep learning methods in computer vision as
            well as limitations in practice. 
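To make the ŷ = f(X; W) notation above concrete, here is a minimal, self-contained gradient descent sketch (plain NumPy with a simple logistic model standing in for a deep network; all names are illustrative, not code from the tutorial):

    import numpy as np

    # Toy data: X is the input, y a binary label (e.g., cancer present or absent).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
    y = (X @ true_w > 0).astype(float)

    W = np.zeros(5)        # parameters W, initialized at zero
    lr = 0.1               # learning rate

    def f(X, W):
        """Model output y_hat = f(X; W): a logistic prediction."""
        return 1 / (1 + np.exp(-X @ W))

    for step in range(500):
        y_hat = f(X, W)
        grad = X.T @ (y_hat - y) / len(y)   # gradient of the cross-entropy loss in W
        W -= lr * grad                      # gradient descent update

    y_hat = f(X, W)
    loss = -np.mean(y * np.log(y_hat + 1e-9) + (1 - y) * np.log(1 - y_hat + 1e-9))
    print("final loss:", round(float(loss), 4))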
        
        (with Elina Hwang and Linda Argote)
        Information Systems Research, 30(2),
          2019, 389-410.
        INFORMS TIMES 2024 Best Paper in
            Management Science, Finalist
Abstract:
            This study investigates how the information that individuals
            accumulate through helping others in a customer support
            crowdsourcing community influences their ability to generate novel,
            popular, and feasible ideas in an innovation crowdsourcing
            community. A customer support crowdsourcing community is one in
            which customers help each other develop solutions to their current
            problems with a company’s products; an innovation crowdsourcing
            community is one in which customers propose new product ideas
            directly to a company. Because a customer support community provides
            information regarding customers’ current needs and provides
            opportunities to help individuals activate relevant means
            information, we expect that individuals’ experience of helping in a
            customer support community enhances their new product ideation
            performance. By utilizing a natural language processing technique,
            we construct each individual’s information network based on his or
            her helping activities in a customer support community. Building on
            analogical reasoning theory, we hypothesize that the patterns of
            individuals’ information networks, in terms of breadth and depth,
            influence their various new product ideation outcomes in an
            innovation crowdsourcing community. Our analysis reveals that
            generalists, who have offered help on broad topic domains in the
            customer support community, are more likely to create novel ideas
            than are non-generalists. Further, we find that generalists who have
            accumulated deep knowledge in at least one topic domain (deep
            generalists) outperform non-generalists in their ability to generate
            popular and feasible ideas, whereas generalists who have accumulated
            only shallow knowledge across diverse domain areas (shallow
            generalists) do not. The results suggest that the ability of
            generalists to outperform non-generalists in creating popular and
            feasible ideas is contingent on whether they have also accumulated
            deep knowledge. 
        
        (with Shunyuan Zhang and Anindya Ghose)
        Information Systems Research, 30(1),
          2019, 15-33.
Abstract:
            We investigate the long-term impact of competing against superstars
            in crowdsourcing contests. Using a unique 50-month longitudinal
            panel data set on 1677 software design crowdsourcing contests, we
            illustrate a learning effect where participants are able to improve
            their skills (learn) more when competing against a superstar than
            otherwise. We show that an individual’s probability of winning in
            subsequent contests increases significantly after she has
            participated in a contest with a superstar coder than otherwise. We
            build a dynamic structural model with individual heterogeneity where
            individuals choose contests to participate in and where learning in
            a contest happens through an information theory-based Bayesian
            learning framework. We find that individuals with lower ability to
            learn tend to value monetary reward highly, and vice versa. The
            results indicate that individuals who greatly prefer monetary reward
            tend to win fewer contests, as they rarely achieve the high skills
            needed to win a contest. Counterfactual analysis suggests that
            instead of avoiding superstars, individuals should be encouraged to
            participate in contests with superstars early on, as it can
            significantly push them up the learning curve, leading to higher
            quality and a higher number of submissions per contest. Overall, our
            study shows that individuals who are willing to forego short-term
            monetary rewards by participating in contests with superstars have
            much to gain in the long term. 
        
        (with Quan Wang and Beibei Li)
        Information Systems Research 29(2),
          2018, 273-291.
          Best Paper in Information Systems Research 2018, Finalist
Abstract:
            While the growth of the mobile apps market has created significant
            market opportunities and economic incentives for mobile app
developers to innovate, it has also inevitably invited other
            developers to create rip-offs. Practitioners and developers of
            original apps claim that copycats steal the original app’s idea and
            potential demand, and have called for app platforms to take action
            against such copycats. Surprisingly, however, there has been little
            rigorous research analyzing whether and how copycats affect an
            original app’s demand. The primary deterrent to such research is the
            lack of an objective way to identify whether an app is a copycat or
            an original. Using a combination of machine learning techniques such
            as natural language processing, latent semantic analysis,
            network-based clustering and image analysis, we propose a method to
            identify apps as original or copycat and detect two types of
            copycats: deceptive and non-deceptive. Based on the detection
            results, we conduct an econometric analysis to determine the impact
            of copycat apps on the demand for the original apps on a sample of
            10,100 action game apps by 5,141 developers that were released in
            the iOS App Store over five years. Our results indicate that the
            effect of a specific copycat on an original app’s demand is
            determined by the quality and level of deceptiveness of the copycat.
            High-quality, non-deceptive copycats negatively affect demand for
            the originals. In contrast, low-quality, deceptive copycats
            positively affect demand for the originals. Results indicate that in
aggregate the impact of copycats on the demand for original mobile apps is statistically insignificant. Our study contributes to the
            growing literature on mobile app consumption by presenting a method
            to identify copycats and providing evidence of the impact of
            copycats on an original app’s demand. 
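As a toy illustration of the text-similarity component of such a detection pipeline (hypothetical app descriptions; TF-IDF with cosine similarity stands in for the paper's fuller LSA, clustering, and image-analysis framework):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical descriptions: an original app, a near-copy, and an unrelated app.
    descriptions = [
        "Flappy Wings: tap to fly the bird through the pipes and beat your high score",
        "Floppy Wing Bird: tap to flap the bird through pipes and set a new high score",
        "Sudoku Master: thousands of logic puzzles with hints and daily challenges",
    ]

    tfidf = TfidfVectorizer(stop_words="english").fit_transform(descriptions)
    sim = cosine_similarity(tfidf)

    # A high similarity to an existing app's description flags a potential copycat.
    print(sim[0, 1], sim[0, 2])   # the near-copy scores far higher than the unrelated app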
        
        (with Yingda Lu and Baohong Sun)
        Management Information Systems Quarterly,
          41(2), 2017, 607-628.
Abstract:
            Many companies have adopted technology driven social learning
            platforms such as social CRM (crowdsourcing customer support from
            customers) to support knowledge sharing among customers. A number of
            these self-evolving online customer support communities have
            reported the emergence of a core-periphery knowledge sharing network
            structure. In this study, we investigate why such a structure
            emerges and its implications for knowledge sharing within the
            community. We propose a dynamic structural model with endogenized
            knowledge-sharing and network formation. Our model recognizes the
            dynamic and interdependent nature of knowledge-seeking-and-sharing
            decisions and allows them to be driven by knowledge increments and
            social status building in anticipation of future reciprocal rewards
            from peers. Applying this model to a fine grained panel data set
            from a social customer support forum for a telecom firm, we
            illustrate that a user in this community values being connected to
            other well connected individuals. As a result, a user is more
            inclined to answer questions of those who are in the core (well
            connected) than the ones who are in the periphery (not well
            connected). We find that users are taking into account the expected
            likelihood of their questions receiving a solution before asking a
            question. With the emergence of core-periphery network structure,
            the peripheral individuals are discouraged from asking questions as
            their expectation of receiving a solution to their question is very
            low. Thus, the core-periphery structure has created a barrier to
            knowledge flow to new customers who need the knowledge the most. Our
            counterfactuals show that hiding the identity of the knowledge
            seeker or making the individual contributions obsolete faster helps
            break the core-periphery structure and improves knowledge sharing in
            the community. 
        
        (with Xiao Liu and Kannan Srinivasan)
        Marketing Science, 35(3), 2016,
          363-388.
          Don Morrison Long Term Impact Award in Marketing 2023, Finalist
Abstract:
            Accurate forecasting of sales/consumption is particularly important
            for marketing because this information can be used to adjust
            marketing budget allocations and overall marketing strategies.
            Recently, online social platforms have produced an unparalleled
            amount of data on consumer behavior. However, two challenges have
            limited the use of these data in obtaining meaningful business
            marketing insights. First, the data are typically in an unstructured
            format, such as texts, images, audio, and video. Second, the sheer
            volume of the data makes standard analysis procedures
            computationally unworkable. In this study, we combine methods from
            cloud computing, machine learning, and text mining to illustrate how
            online platform content, such as Twitter, can be effectively used
            for forecasting. We conduct our analysis on a significant volume of
            nearly two billion Tweets and 400 billion Wikipedia pages. Our main
            findings emphasize that, by contrast to basic surface-level measures
            such as the volume of or sentiments in Tweets, the information
            content of Tweets and their timeliness significantly improve
            forecasting accuracy. Our method endogenously summarizes the
            information in Tweets. The advantage of our method is that the
            classification of the Tweets is based on what is in the Tweets
            rather than preconceived topics that may not be relevant. We also
            find that, by contrast to Twitter, other online data (e.g., Google
            Trends, Wikipedia views, IMDB reviews, and Huffington Post news) are
            very weak predictors of TV show demand because users tweet about TV
            shows before, during, and after a TV show, whereas Google searches,
            Wikipedia views, IMDB reviews, and news posts typically lag behind
            the show. 
        
        (with Ray Reagans and Ramayya Krishnan)
        Organization Science, 26(5), 2015,
          1400-1414.
Abstract:
            Third parties play a prominent role in network-based explanations
            for successful knowledge transfer. Third parties can be either
            shared or unshared. Shared third parties signal insider status and
            have a predictable positive effect on knowledge transfer. Unshared
            third parties, however, signal outsider status and are believed to
            undermine knowledge transfer. Surprisingly, unshared third parties
            have been ignored in empirical analysis, and so we do not know if or
            how much unshared third parties contribute to the process. Using
            knowledge transfer data from an online technical forum, we
            illustrate how unshared third parties affect the rate at which
            individuals initiate and sustain knowledge transfer relationships.
            Empirical results indicate that unshared third parties undermine
            knowledge sharing, and they also indicate that the magnitude of the
            negative unshared-third-party effect declines the more unshared
            third parties overlap in what they know. Our results provide a more
            complete view of how third parties contribute to knowledge sharing.
            The results also advance our understanding of network-based dynamics
            defined more broadly. By documenting how knowledge overlap among
            unshared third parties moderates their negative influence, our
            results show when the benefits provided by third parties and by
            bridges (i.e., relationships with outsiders) will be opposed versus
            when both can be enjoyed. 
        
        (with Elina Hwang and Linda Argote)
        Organization Science, 26(6), 2015,
          1593-1611.
Abstract:
            Many organizations have launched online knowledge-exchanging
            communities to promote knowledge sharing among their employees. We
            empirically examine the dynamics of knowledge sharing in an
            organization-hosted knowledge forum. Although previous researchers
            have suggested that geographic and social boundaries disappear
            online, we hypothesize that they remain because participants prefer
            to share knowledge with others who share similar attributes, due to
            the challenges involved in knowledge sharing in an online community.
            Further, we propose that as participants acquire experience in
            exchanging knowledge, they learn to rely more on expertise
            similarity and less on categorical similarities, such as location or
            hierarchical status. As a result, boundaries based on categorical
            attributes are expected to weaken, and boundaries based on expertise
            are expected to strengthen, as participants gain experience in the
            online community. Empirical support for this argument is obtained
            from analyzing a longitudinal dataset of an internal online
            knowledge community at a large multinational IT consulting firm. 
        
        (with Yan Huang and Anindya Ghose)
        Management Science, 61(12), 2015,
          2825-2844.
Abstract:
            We develop and estimate a dynamic structural framework to analyze
            social-media content creation and consumption behavior of employees
            within an enterprise. We focus, in particular, on employees'
            blogging behavior. The model incorporates two key features that are
            ubiquitous in blogging forums: Users face 1) a trade-off between
            blog posting and blog reading; and 2) a trade-off between
            work-related and leisure-related content. We apply the model to a
            unique dataset that comprises the complete details of blog posting
            and reading behavior of 2,396 employees over a 15-month period at a
            Fortune 1000 IT services and consulting firm. We find evidence of
            strong competition among employees with regard to attracting
            readership for their posts. We also find that the utility employees
            derive from work-related blogging is 4.4 times what they derive from
            leisure-related blogging. However, employees still post a
            significant amount of leisure posts. This is because there is a
            significant spillover effect on the readership of work posts from
            the creation of leisure posts. In addition, we find that reading and
            writing work-related posts is more costly than reading and writing
            leisure-related posts, on average. We conduct counterfactual
            experiments that provide insights into how different policies may
            affect employee behavior. We find that a policy of prohibiting
            leisure-related activities can hurt the knowledge sharing in an
            enterprise setting. By demonstrating that there are positive
            spillovers from leisure-related blogging to work-related blogging,
            our results suggest that a policy of abolishing leisure-related
            content creation can inadvertently have adverse consequences on
            work-related content creation in an enterprise setting. 
        
        (with Liye Ma, Alan Montgomery and Michael
          Smith)
        Information Systems Research, 25(3),
          2014, 590-603.
Abstract:
            Digital distribution channels raise many new challenges for managers
            in the media industries. This is particularly true for movie studios
            where high-value content can be stolen and released through
            illegitimate digital channels, even prior to the release of the
            movie in legal channels. In response to this potential threat, movie
            studios have spent millions of dollars to protect their content from
            unauthorized distribution throughout the lifecycle of films. They
            have focused their efforts on the pre-release period under the
            assumption that pre-release piracy could be particularly harmful for
            a movie’s success. However, surprisingly, there has been little
            rigorous research to analyze whether, and how much, pre-release
            movie piracy diminishes legitimate sales. In this paper, we analyze
            this question using data collected from a unique Internet
            file-sharing site. We find that, on average, pre-release piracy
            causes a 19.1% decrease in revenue compared to piracy that occurs
post-release. Our study contributes to the growing literature on
            piracy and digital media consumption by presenting evidence of the
            impact of Internet-based movie piracy on sales, and by analyzing
            pre-release piracy, a setting that is distinct from much of the
            extant literature. 
        
        (with Yan Huang and Kannan Srinivasan)
        Management Science, 60(9), 2014,
          2138-2159. 
          INFORMS TIMES 2019 Best Paper in Management Science, Finalist
            Best Paper in Management Science 2013-2016, Information Systems, Finalist
          
Abstract:
            We propose a dynamic structural model that illuminates the economic
            mechanisms shaping individual behavior and outcomes on crowdsourced
            ideation platforms. We estimate the model using a rich data set
            obtained from IdeaStorm.com, a crowdsourced ideation initiative
            affiliated with Dell. We find that, on IdeaStorm.com, individuals
            tend to significantly underestimate the costs to the firm for
            implementing their ideas but overestimate the potential of their
            ideas in the initial stages of the crowdsourcing process. Therefore,
            the “idea market” is initially overcrowded with ideas that are less
            likely to be implemented. However, individuals learn about both
            their abilities to come up with high-potential ideas as well as the
            cost structure of the firm from peer voting on their ideas and the
            firm’s response to contributed ideas. We find that individuals learn
            rather quickly about their abilities to come up with high-potential
            ideas, but the learning regarding the firm’s cost structure is quite
            slow. Contributors of low-potential ideas eventually become
            inactive, whereas the high-potential idea contributors remain
            active. As a result, over time, the average potential of generated
            ideas increases while the number of ideas contributed decreases.
            Hence, the decrease in the number of ideas generated represents
            market efficiency through self-selection rather than its failure.
            Through counterfactuals, we show that providing more precise cost
            signals to individuals can accelerate the filtering process.
            Increasing the total number of ideas to respond to and improving the
            response speed will lead to more idea contributions. However,
            failure to distinguish between high- and low-potential ideas and
            between high- and low-ability idea generators leads to the overall
            potential of the ideas generated to drop significantly. 
        Online
            Appendix 
        
        (with Nachiketa Sahoo and Tridas
          Mukhopadhyay)
        Information Systems Research, 25(1),
          2014, 35-52.
Abstract:
            We investigate the dynamics of blog reading behavior of employees in
            an enterprise blogosphere. A dynamic model is developed and
            calibrated using longitudinal data from a Fortune 1000 IT services
            firm. We identify a variety-seeking behavior of blog readers where
            they frequently switch from reading on one set of topics to another
            dynamically. Our results indicate that this switching behavior is
            induced by the textual characteristics (sentiment and quality) of
            the posts read, reader characteristics (status, location,
expertise), or a reader's inherent desire for variety. Our modeling
            framework allows us to segregate the impact of post-textual
            characteristics on attracting readers from retaining them. We find
            that the textual characteristics that appeal to the sentiment of the
            reader affect both reader attraction and retention. However, textual
            characteristics that reflect only the quality of the posts affect
            only reader retention. The modeling framework and findings of this
            study highlight opportunities for a firm to influence blog reading
            behavior of its employees to align it with its goals. We provide
            directions to improve the utility of blogs as a medium for knowledge
            sharing. Overall, the blog reading dynamics estimation of this study
            contributes to the development of theoretically grounded
            understanding of reading behavior of individuals in online settings
            and more specifically in communities formed around user generated
            content. 
        
        (with Yingda Lu and Kinshuk Jerath)
        Management Science, 59(8), 2013,
          1783-1799.
Abstract:
            We study the drivers of the emergence of opinion leaders in a
            networked community where users share information with each other.
            Our specific setting is that of Epinions.com, a website dedicated to
            user-generated product reviews. Epinions.com employs a novel
            mechanism in which every member of the community can include other
            members, whose reviews she trusts, in her “web of trust.” This leads
            to the formation of a network of trust among reviewers with high
            in-degree individuals being the opinion leaders. Accordingly, we
            study the emergence of opinion leaders in this community using a
            network formation paradigm. We model network growth by using a
            dyad-level proportional hazard model with time-varying covariates.
            To estimate this model, we use Weighted Exogenous Sampling with
            Bayesian Inference (WESBI), a methodology that we develop for fast
            estimation of dyadic models on large network datasets. We find that,
            in the Epinions network, both the widely-studied “preferential
            attachment” effect based on the existing number of inlinks (i.e., a
            network-based property of a node) and the number and quality of
            reviews written (i.e., an intrinsic property of a node) are
            significant drivers of new incoming trust links to a reviewer (i.e.,
            inlinks to a node). Interestingly, time is an important moderator of
            these effects — the number of recent reviews written has a stronger
            effect on the current rate of attracting inlinks than the number of
            recent inlinks received; however, the aggregate
            number of reviews written in the past has no effect, while the
            aggregate number of inlinks obtained in the past has a significant
            effect on the current rate of attracting inlinks. This leads to the
            novel and important implication that, in a network growth setting,
            intrinsic node characteristics are a stronger short-term driver of
            additional inlinks, while the preferential attachment effect has a
            smaller impact but it persists for a longer time. We discuss the
            managerial implications of our results for the design and
            organization of online review communities. 
        Online
            Appendix 
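
        For readers curious about the estimation idea behind this paper, the sketch below shows the generic logic of sampling dyads for a link-formation model: keep every formed link ("event") plus a small random sample of non-links, fit a logit, and correct the intercept for the sampling rate (slope estimates remain consistent). This is a minimal illustration on synthetic data with invented covariate names; it is not the WESBI estimator developed in the paper.

```python
# Illustrative sketch only: generic case-control sampling of dyads for a
# link-formation logit, with an intercept correction for the sampling rate.
# All data and variable names below are synthetic / hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_dyads = 200_000
recent_reviews = rng.poisson(2.0, n_dyads)   # intrinsic node activity (made up)
recent_inlinks = rng.poisson(1.0, n_dyads)   # preferential-attachment proxy (made up)

# Synthetic rare-event link formation.
logit_p = -6.0 + 0.40 * recent_reviews + 0.30 * recent_inlinks
y = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# Keep every event and a 5% random sample of non-events.
r = 0.05
keep = (y == 1) | (rng.random(n_dyads) < r)
X = sm.add_constant(np.column_stack([recent_reviews, recent_inlinks])[keep])
fit = sm.Logit(y[keep], X).fit(disp=0)

beta = fit.params.copy()
beta[0] -= np.log(1.0 / r)   # prior-correction of the intercept for the sampling rate
print(beta)                  # slopes should be near (0.40, 0.30), intercept near -6.0
```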
        
        (with Corey Phelps)
        Information Systems Research, 24(3),
          2013, 539-560.
Abstract:
            Existing research provides little insight into how social influence
            affects the adoption and diffusion of competing innovative
            artifacts and how the experiences of organizational members who have
            worked with particular innovations in their previous employers
            affect their current organizations’ adoption decision. We adapt and
            extend the heterogeneous diffusion model from sociology and examine
            the conditions under which prior adopters of competing OSS licenses
            socially influence how a new OSS project chooses among such licenses
            and how the experiences of the project manager of a new OSS project
            with particular licenses affect its susceptibility to this social
            influence. We test our predictions using a sample of 5,307 open
            source projects hosted at SourceForge. Our results suggest the most
            important factor determining a new project’s license choice is the
            type of license chosen by existing projects that are socially closer
            to it in its inter-project social network. Moreover, we find that
            prior adopters of a particular license are more infectious in their
            influence on the license choice of a new project as their size and
            performance rankings increase. We also find that managers of new
            projects who have been members of more successful prior OSS projects
            and who have greater depth and diversity of experience in the OSS
            community are less susceptible to social influence. Finally, we find
            a project manager is more likely to adopt a particular license type
            when his or her project occupies a similar social role as other
            projects that have adopted the same license. These results have
            implications for research on innovation adoption and diffusion, open
            source software licensing, and the governance of economic exchange.
          
        Abstract:
          We present a hidden Markov model for collaborative filtering of
          implicit ratings when the ratings have been generated by a set of
          changing user preferences. Most of the work in the collaborative
          filtering and recommender systems literature has been developed under
          the assumption that user preference is a static pattern. However, we
          show by analyzing a dataset on employees’ blog reading behaviors that
          users’ reading behaviors do change over time. We model the unobserved
          user preference as a Hidden Markov sequence. The observation that
          users read variable numbers of blog articles in each time period and
          choose different types of articles to read requires a novel
          observation model. We use a Negative Binomial mixture of Multinomials
          to model such observations. This allows us to identify stable global
          preferences of users towards the items in the dataset and to track
          users as they move through these preferences. We compare the algorithm
          with a number of static algorithms and a recently proposed dynamic
          collaborative filtering algorithm and find that the proposed HMM based
          collaborative filter outperforms the other algorithms. 
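
        A minimal generative sketch of the kind of observation model described above: a latent Markov preference state, a Negative Binomial count of articles read in each period, and a Multinomial split of those reads across topics. All parameters are invented for illustration and do not come from the paper's calibration.

```python
# Generative sketch (illustrative parameters only, not the calibrated model):
# hidden Markov chain over preference states; per period, a Negative Binomial
# number of articles read and a Multinomial allocation of reads across topics.
import numpy as np

rng = np.random.default_rng(1)

P = np.array([[0.8, 0.2],                   # state-transition matrix
              [0.3, 0.7]])
nb_n, nb_p = [2, 5], [0.4, 0.5]             # NegBin parameters per state
topic_probs = np.array([[0.7, 0.2, 0.1],    # topic mix per state
                        [0.1, 0.3, 0.6]])

state, T = 0, 12
for t in range(T):
    n_reads = rng.negative_binomial(nb_n[state], nb_p[state])
    reads_by_topic = rng.multinomial(n_reads, topic_probs[state])
    print(t, state, n_reads, reads_by_topic)
    state = rng.choice(2, p=P[state])       # move to next latent preference state
```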
        
        (with Rohit Aggarwal, Ram Gopal and Ramesh
          Sankaranarayanan)
        Information Systems Research, 23(2),
          2012, 305-322.
Abstract
        
        (with Nachiketa Sahoo and Tridas
          Mukhopadhyay)
        Management Information Systems Quarterly,
          35(4), 2011, 813-829.
Abstract:
            Blogs have recently received a lot of attention, especially in the
            business community, with a number of firms encouraging their
            employees to publish blogs to reach out and connect to a wider
            audience. The business world is beginning to realize that employee
            blogs can cast a firm in either a positive or a negative light,
            thereby enhancing or harming the firm’s reputation. However, we find
            that negative posts by employees draw a higher readership, which has
            the potential to actually help the overall reputation of the firm.
            The explanation for this is that readers perceive an employee
            blogger to be honest and helpful when they read negative posts on
            the blog, and recommend the blog more to their friends, who will
            then also be exposed to the positive posts on the blog. First, we
            present a theoretical discussion, explaining why blogs containing
            negative posts could draw a larger audience. Next, we present
            empirical evidence that blogs that contain negative posts do draw a
            larger readership, and we derive a relationship between the extent
            of negative posts and readership. Our empirical model accounts for
            inherent non-linearities, serial correlation, issues of endogeneity
            and unobserved heterogeneity, and potential alternative
            specifications. Our results suggest that ceteris paribus, negative
            posts increase the readership of an employee blog asymptotically.
            Furthermore, we use the derived model to suggest conditions under
            which negative posts on an employee blog can lead to a greater
            overall positive influence on readers towards the employee’s firm.
            We illustrate the application of the framework using unique
            blogging data from employees at a Fortune 500 company. 
        
        (with Yong Tan and Vijay Mookerjee)
        Management Information Systems Quarterly,
          35(4), 2011, 813-829.
Abstract:
            What determines open source project success? In this study, we
            investigate the impact of network social capital - the benefits open
            source developers secure from their memberships in a developer
            collaboration network - on open source project success. We focus on
            one specific type of success as measured by the productivity of open
            source project team. Specific hypotheses are developed and tested on
            a longitudinal panel of 2378 projects hosted at Sourceforge. We find
            that network social capital is not equally accessible to or
            appropriated by all projects. Our main results are (1) teams with
            greater internal cohesion are more successful, (2) external cohesion
            (cohesion among the external contacts of a team) has an inverse
            U-shaped relationship with the project's success; moderate levels of
            external cohesion are the best for a project's success, rather than
            very low or very high levels of this variable, (3) the technological
            diversity of a contact also has the greatest benefit when it is
            neither too low nor too high, and (4) the numbers of direct and
            indirect external contacts are positively correlated with a
            project's success, with the effect of direct contacts moderated by
            the number of indirect contacts. These results
            are robust to a number of control variables and alternate model
            specifications. Several theoretical and managerial implications are
            provided. 
        
        (with Tridas Mukhopadhyay and Seung Hyun Kim)
        Information Systems Research, 22(3),
          2011, 586-605. 
          Best Paper in Information Systems Research 2011, Finalist
Abstract:
            To improve operational efficiencies while providing state of the art
            healthcare services, hospitals rely on IT enabled physician referral
            systems (IT-PRS). This study examines learning curves in an IT-PRS
            setting to determine whether agents achieve performance improvements
            from cumulative experience at different rates and how information
            technologies transform the learning dynamics in this setting. We
            present a hierarchical Bayes model that accounts for different agent
            skills (domain and system), and estimate learning rates for three
            types of referral requests: emergency (EM), non-emergency (NE), and
            non-emergency out of network (NO). Further, the model accounts for
            complementarities among the three referral request types and the
            impact of system upgrade on learning rates. We estimate this model
            using data from more than 80,000 referral requests to a large
            IT-PRS. We find that (1) the IT-PRS exhibits a learning rate of 4.5%
            for EM referrals, 7.2% for NE referrals, and 12.3% for NO referrals.
            This is slower than the learning rate of manufacturing (on average
            20%) and more comparable to other service settings (on average 8%).
            (2) Domain and system experts are found to exhibit significantly
            different learning behaviors. (3) Significant and varying
            complementarities among the three referral request types are also
            observed. (4) The performance of domain experts is affected more
            adversely in comparison to system experts immediately after system
            upgrade. (5) Finally, the learning rate change subsequent to system
            upgrade is also higher for system experts in comparison to domain
            experts. Overall, system upgrades are found to have a long term
            positive impact on the performance of all agents. The learning curve
            estimation of this study contributes to a theoretically grounded
            understanding of the learning behaviors of domain
            and system experts in an IT enabled critical healthcare service
            setting. 
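
        For readers unfamiliar with learning-curve conventions, the snippet below shows, under the standard power-law assumption (time per task = a·N^(-b), with per-doubling learning rate r = 1 − 2^(-b)), how reported learning rates map to curve slopes. This is only back-of-the-envelope arithmetic; the paper's hierarchical Bayes model is considerably richer.

```python
# Back-of-the-envelope sketch under the standard power-law learning curve:
# time_per_task = a * N**(-b), "learning rate" r = 1 - 2**(-b) per doubling
# of cumulative experience. Rates below are those quoted in the abstract.
import math

def slope_from_rate(r):
    """Power-law exponent b implied by a per-doubling learning rate r."""
    return -math.log2(1.0 - r)

for label, r in [("EM referrals", 0.045), ("NE referrals", 0.072),
                 ("NO referrals", 0.123), ("manufacturing avg", 0.20),
                 ("services avg", 0.08)]:
    print(f"{label:18s} rate={r:.3f}  implied slope b={slope_from_rate(r):.3f}")
```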
        
        (with Yong Tan and Nara Youn)
        Information Systems Research, 22(4),
          2011, 790-807.
Abstract:
            This study examines whether developers learn from their experience
            and from interactions with peers in OSS projects. A Hidden Markov
            Model (HMM) is proposed that allows us to investigate (1) the extent
            to which OSS developers actually learn from their own experience and
            from interactions with peers, (2) whether a developer's abilities to
            learn from these activities vary over time, and (3) to what extent
            developer learning persists over time. We calibrate the model on six
            years of detailed data collected from 251 developers working on 25
            OSS projects hosted at Sourceforge. Using the HMM, three learning
            states (high, medium, and low) are identified and the marginal
            impact of learning activities on moving the developer between these
            states is estimated. Our findings reveal different patterns of
            learning in different learning states. Learning from peers appears
            as the most important source of learning for developers across the
            three states. Developers in the medium learning state benefit most
            through discussions that they initiate. On the other hand,
            developers in the low and the high states benefit the most by
            participating in discussions started by others. While in the low
            state, developers depend entirely upon their peers to learn, whereas
            in the medium or high states they can also draw upon their own
            experiences. Explanations for these varying impacts of learning
            activities on the transitions of developers between the three
            learning states are provided. 
        
        (with Yong Tan)
        Journal of Management Information
            Systems, 27(3), 2011, 179-210.
Abstract:
            Over the last few years, open source software (OSS) development has
            gained huge popularity and attracted a wide variety of
            developers to its fold. According to software engineering
            folklore, the architecture and the organization of software depend
            on the communication patterns of the contributors. Communication
            patterns among developers influence knowledge sharing among them.
            Unlike in a formal organization, the communication network
            structures in an OSS project evolve unrestricted and unplanned. We
            develop a non-cooperative game theoretic model to investigate the
            network formation in an OSS team and to characterize the stable and
            efficient structures. We incorporate developer heterogeneity in the
            network based on their informative value. We find that, for a given
            scenario, there may exist several stable structures which are
            inefficient. We also find that there may not always exist a stable
            structure that is efficient. This can be explained by the fact that
            the stability of the structure is dependent on the developer's
            maximization of self utility whereas the efficiency of the structure
            is dependent on the maximization of group utility. In general, a
            tension exists between the stability and efficiency of structures
            because developers act in their self interest rather than the group
            interest. We find, whenever there is such a tension, the stable
            structure is either under-connected across types or over-connected
            within types of developers from an efficiency perspective.
            Empirically, we use the latent class model and analyze two
            real-world OSS projects hosted at Sourceforge.net. For each project,
            we identify the different developer types and a stable structure,
            which fits well with the predictions of our model. We further
            discuss implications of our results and provide directions for
            future research. 
        
        ACM Transactions on Software Engineering
            and Methodology, 20(2), 2010, 6:1-6:27.
Abstract:
            Are some Open Source Software (OSS) communities more conducive to
            software development than others? In this study, we investigate the
            impact of community level networks (relationships that exist among
            developers in an OSS community) on member developers' productivity.
            Specifically, we argue that OSS community networks, characterized by
            small world properties, would positively influence the productivity
            of the member developers by providing them with speedy and reliable
            access to more quantity and variety of information and knowledge
            resources. Specific hypotheses are developed and tested using
            longitudinal data on a large panel of 4279 projects from 15
            different OSS communities hosted at Sourceforge. Our results suggest
            that there is significant variation in small world properties of OSS
            communities at Sourceforge. After accounting for project-, foundry-,
            and time-specific observed and unobserved effects, we found a
            statistically significant relationship between the small world
            properties of a community and the technical and commercial success
            of the software produced by its members. We also found no
            significant relationship between the betweenness and closeness
            centralities of the project teams and their success. These results
            were robust to a number of controls and model specifications. 
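
        A generic illustration of the small-world diagnostics referenced above (clustering coefficient and average path length relative to a same-size random benchmark), computed with networkx on a synthetic graph. It is not the study's data or its exact measures.

```python
# Generic small-world diagnostics on a synthetic network (not the study's data):
# compare clustering and average path length against a same-size random graph.
import networkx as nx

G = nx.watts_strogatz_graph(n=500, k=6, p=0.05, seed=42)   # stand-in "developer network"
R = nx.gnm_random_graph(n=G.number_of_nodes(), m=G.number_of_edges(), seed=42)

def summarize(graph, name):
    # restrict path-length computation to the largest connected component
    comp = graph.subgraph(max(nx.connected_components(graph), key=len))
    print(name,
          "clustering =", round(nx.average_clustering(graph), 3),
          "avg path length =", round(nx.average_shortest_path_length(comp), 3))

summarize(G, "synthetic developer network:")
summarize(R, "random benchmark:")
# A "small world" combines clustering well above the random benchmark with
# an average path length close to it.
```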
        
        Working Papers
        
        (with Julie Wang, and Zoey Jiang)
Abstract:
            As new data privacy regulations tighten consumer data access, e-commerce platforms face a growing population of anonymous users who choose not to share their personal information. This paper examines how such anonymity reshapes competitive dynamics in digital marketplaces. We develop a two-stage theoretical model where, in the first stage, the platform designs customer segmentation, and in the second stage, both the platform and third-party sellers engage in differential pricing. We introduce the concept of "fuzzy segmentation"-a strategy that combines anonymous users with selectively grouped data-sharing consumers. This approach enables platforms to soften competition and extract higher profits, challenging the classical view that segmentation necessarily intensifies price rivalry. Surprisingly, we find that anonymous consumers, despite opting for privacy, may still face a price premium at equilibrium. Additionally, data-sharing consumers who prefer third-party sellers may also experience negative spillovers, paying higher prices due to their inclusion in mixed segments. Our findings reveal a critical trade-off: while privacy regulations empower consumers to protect their data, they may inadvertently lead to higher prices for both anonymous and certain data-sharing consumers, reducing overall consumer welfare. This study sheds light on the strategic importance of segmentation in platform markets and offers fresh insights into the unintended consequences of data privacy regulations. 
        
        (with Shunyuan Zhang, Nitin Mehta and Kannan
          Srinivasan)
Abstract:
            Prior research has shown that high-quality images increase the
            current demand for Airbnb properties. However, many properties do
            not adopt high-quality images even when offered for free by Airbnb.
            Our study provides an answer to this puzzling observation. We
            develop a structural model of demand and supply of Airbnb
            properties, where hosts strategically choose image quality for their
            properties. Using a one-year panel data from 958 properties in
            Manhattan, we find evidence that a host’s decision to use
            high-quality images entails a trade-off: high-quality images may
            attract more guests in the current period, but if the property does
            not live up to the expectations created by the image quality, then
            they risk disappointing guests. The guests would then leave bad
            reviews or no reviews at all, which would adversely affect future demand.
            Counterfactual policy simulations show that Airbnb could
            significantly increase its profits (up to 18.9%) by offering
            medium-quality images for free to hosts or providing free access to
            a choice between high-quality and medium-quality images. These
            policies help improve Airbnb's profits since they enable the hosts
            to upgrade their image quality to an extent that aligns with their
            property quality.  
        
        (with Qiaochu Wang, Yan Huang and Kannan
          Srinivasan)
Abstract:
            Automated pricing strategies in e-commerce can be broadly
            categorized into two forms - simple rule-based such as undercutting
            the lowest price, and more sophisticated artificial intelligence
            (AI) powered algorithms, such as reinforcement learning (RL)
            algorithms. Although simple rule-based pricing remains the most
            widely used strategy, a few retailers have adopted pricing
            algorithms powered by AI. RL algorithms are particularly appealing
            for pricing due to their abilities to autonomously learn an optimal
            policy and adapt to changes in competitors' pricing strategies and
            market environment. Despite the common belief that RL algorithms
            hold a significant advantage over rule-based strategies, our
            extensive pricing experiments demonstrate that when competing
            against RL pricing algorithms, simple rule-based algorithms may
            result in higher prices and benefit all sellers, compared to
            scenarios where multiple RL algorithms compete against each other.
            To validate our findings, we estimate a non-sequential search
            structural demand model using individual-level data from a large
            e-commerce platform and conduct counterfactual simulations. The
            results show that in a real-world demand environment, simple
            rule-based algorithms outperform RL algorithms when facing other RL
            competitors. Our research sheds new light on the effectiveness of
            automated pricing algorithms and their interactions in competitive
            markets, and provides practical insights for retailers in selecting
            the appropriate pricing strategies. 
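
        A stylized toy version of the kind of interaction studied above: a Q-learning seller repeatedly prices against a rule-based rival that undercuts the last observed price, facing a simple logit demand with an outside option. Every parameter is invented for illustration; this is not the experimental platform, demand model, or algorithm configuration used in the paper.

```python
# Toy simulation (illustrative only): one Q-learning seller versus a rule-based
# rival that undercuts the RL seller's previous price by one grid step.
import numpy as np

rng = np.random.default_rng(7)
prices = np.linspace(1.0, 2.0, 11)     # feasible price grid (made up)
n = len(prices)
cost, mu = 1.0, 0.25                   # marginal cost, logit scale (made up)

def profits(p_rl, p_rule):
    # logit shares over the two sellers and an outside option
    u = np.exp(np.array([-p_rl, -p_rule, -1.5]) / mu)
    s = u / u.sum()
    return (p_rl - cost) * s[0], (p_rule - cost) * s[1]

Q = np.zeros((n, n))                   # state: rival's current price index
alpha, gamma, eps = 0.1, 0.95, 0.1
last_rl = n - 1                        # RL seller starts at the highest price
avg_prices = []

for t in range(50_000):
    rule_idx = max(0, last_rl - 1)     # rule: undercut last period's RL price
    state = rule_idx
    a = int(rng.integers(n)) if rng.random() < eps else int(Q[state].argmax())
    reward, _ = profits(prices[a], prices[rule_idx])
    next_state = max(0, a - 1)         # rival's price next period
    Q[state, a] += alpha * (reward + gamma * Q[next_state].max() - Q[state, a])
    last_rl = a
    if t >= 45_000:
        avg_prices.append((prices[a] + prices[rule_idx]) / 2)

print("average market price over the final periods:",
      round(float(np.mean(avg_prices)), 3))
```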
            Learning in Human-AI Collaboration
        
        (with Zoey Jiang and Linda Argote)
        Abstract
              (click to expand): Human–artificial intelligence (AI) collaboration is increasingly prevalent in organizational decision making. This study investigates how human decision makers learn during repeated interactions with AI---specifically, machine learning models---and how these learning processes are impacted by AI design, including the transparency and complexity of the model's decision rules. We conceptualize learning in human–AI collaboration as two components: learning from AI to improve independent decision making and learning about AI to improve the weighting of AI advice in collaboration. These components are then integrated to make collaborative decisions. Through a laboratory experiment in which 288 participants make housing price predictions, we find that transparent AI (particularly ones with sparse representations) helps participants learn from AI, improving their future independent decisions. In contrast, black-box or more complex transparent AI facilitates learning about AI by reducing participants' tendency to inappropriately discount AI advice, leading to more effective weighting of AI inputs. Participants' prior decision-making ability moderates these effects, thereby shaping the integrated collaborative decision-making performance. Low-ability participants benefit most from transparent AI with moderate complexity, which balances the two learning components.  High-ability participants perform best with black-box AI---their strong baseline performance reduces the need to learn from AI, allowing these AI designs to focus on mitigating their greater tendency to inappropriately discount AI advice. Notably, this latter finding diverges from the common belief that black-box AI is inherently problematic for users. These findings highlight the importance of aligning AI design with user capabilities, offering actionable insights for AI system designs in organizations. 
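
        One standard way to quantify how much a decision maker discounts advice, as discussed above, is the judge-advisor "weight of advice" measure; the snippet below illustrates it with made-up numbers. It is a generic measure, not necessarily the exact one used in this study.

```python
# Judge-advisor "weight of advice" (WOA): how far the final estimate moves
# from the initial estimate toward the advisor's (here, the AI's) estimate.
# Numbers below are hypothetical.
def weight_of_advice(initial, advice, final):
    if advice == initial:
        return float("nan")            # WOA is undefined when advice equals the prior
    return (final - initial) / (advice - initial)

# e.g., a participant predicts $300k, the model suggests $360k, and the final
# answer is $345k, giving WOA = 0.75 (mild discounting of the AI's advice).
print(weight_of_advice(300_000, 360_000, 345_000))
```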
        
        
        
        Work in Progress
        Wrong Model or Wrong Practices?
          Mis-specified Demand Model and Algorithmic Bias in Personalized
          Pricing
        
        (with Qiaochu Wang, Yan Huang and Kannan
          Srinivasan)
        Abstract: The societal significance of fair machine learning (ML) cannot be overstated, yet quantifying algorithmic bias and ensuring fair ML remains a challenging task. One popular fair ML objective, equality of opportunity, requires equal treatment for individuals who are equally deserving, regardless of their group affiliation. However, determining who should be considered "equally deserving" is a complex and critical aspect that directly affects the estimation of algorithmic bias. This paper emphasizes the importance of accurately measuring equal deservingness in order to accurately estimate algorithmic bias. To illustrate this point, we examine the case of personalized pricing and show that a common misspecification in the model for equal deservingness can lead to incorrect conclusions regarding algorithmic bias. Specifically, using a detailed consumer data set from a large e-commerce platform, we demonstrate that when consumers engage in search behavior before purchasing and there are differences in such behavior based on gender, ignoring the search behavior in the demand specification model can lead one to the incorrect conclusion that personalized pricing is biased against women. Overall, our research highlights the critical role that a proper model specification plays in achieving fair ML practices. 
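
        As a point of reference, the snippet below illustrates the equality-of-opportunity notion in its simplest form: comparing favorable-treatment rates across groups among the individuals labeled as deserving. The labels, predictions, and group indicator are made-up arrays; the paper's argument is precisely that the "deserving" label itself depends on the demand model specification.

```python
# Simple equality-of-opportunity gap: difference in favorable-treatment rates
# across groups among "deserving" individuals. All arrays below are hypothetical.
import numpy as np

def opportunity_gap(y_true, y_pred, group):
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = []
    for g in np.unique(group):
        mask = (group == g) & (y_true == 1)   # deserving members of group g
        rates.append(y_pred[mask].mean())     # group-specific favorable-treatment rate
    return max(rates) - min(rates)

y_true = [1, 1, 0, 1, 0, 1, 1, 0]   # "equally deserving" under some demand model
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]   # favorable treatment assigned by the algorithm
group  = [0, 0, 0, 0, 1, 1, 1, 1]
print(opportunity_gap(y_true, y_pred, group))   # 0.0 would indicate equal opportunity
```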
        Using Machine Learning to Diagnose Dynamic Human Decision Making: Insights from a BioInnovation Context
        
        (with Ziqian Ding and Zoey Jiang)
Abstract
        
        Testing Theoretical Alignment in Language Models: Evidence from Insurance Choices
        
        (with Liying Qiu and Kannan Srinivasan)
        Abstract
              (click to expand): Large language models (LLMs) are increasingly used to simulate human decision-making in structured environments. While their outputs may resemble human behaviors in some cases, it remains unclear whether these outputs reflect consistent application of decision-theoretic principles or arise from alternative regularities. This distinction matters: different cognitive processes can lead to similar choices in one context but diverge under counterfactual or modified conditions. Without knowing what reasoning drives a model’s decisions, it is difficult to evaluate whether its outputs generalize beyond the data it was trained on. We introduce the Theory-Based Chain of Verification (TBCV), a diagnostic framework for testing whether a model's behavior aligns with a specified theory. TBCV operates in two steps: it first infers the decision rule or parameters implied by a model’s initial choices, then evaluates whether the model applies the same rule when presented with strategically modified scenarios. We apply TBCV to the domain of home insurance choices, where consumers select among deductible-premium tradeoffs under risk, a setting extensively studied using prospect theory. While GPT-4-turbo replicates aggregate human behaviors, its individual-level choices often deviate from those predicted by prospect theory and instead reflect patterns such as extremeness aversion. This application illustrates how TBCV can help distinguish between surface-level alignment and theoretical consistency, providing a tool for understanding how predictive systems generate their outputs. The TBCV framework is theory-agnostic and applies broadly to behavioral models and decision systems.
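
        For context, the sketch below evaluates a deductible-premium tradeoff using the standard Tversky-Kahneman prospect-theory value and probability-weighting forms, the kind of theoretical benchmark a TBCV-style check could compare a model's choices against. The functional forms, parameters, and example plans are illustrative, not the paper's calibrated specification.

```python
# Prospect-theory benchmark for a deductible-premium tradeoff (illustrative only):
# the premium is treated as a sure loss, and the deductible as a probabilistic
# loss evaluated with a probability-weighting function. Parameters are the
# commonly cited Tversky-Kahneman values, used here purely for illustration.
import math

def pt_value(premium, deductible, p_claim, lam=2.25, beta=0.88, gamma=0.69):
    """Prospect-theory evaluation of paying `premium` for sure and losing
    `deductible` with probability `p_claim` (losses valued with loss aversion)."""
    w = p_claim**gamma / (p_claim**gamma + (1 - p_claim)**gamma) ** (1 / gamma)
    value_of_loss = lambda x: -lam * abs(x) ** beta
    return value_of_loss(premium) + w * value_of_loss(deductible)

# Compare a low-deductible/high-premium plan with a high-deductible/low-premium plan.
print(pt_value(premium=1200, deductible=500,  p_claim=0.05))
print(pt_value(premium=900,  deductible=2500, p_claim=0.05))
```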
        
              Generative Consumer Search and Structural Estimation
        
        (with Liying Qiu, Nitin Mehta and Kannan Srinivasan)
         INFORMS Marketing Science Annual Conference 2025, Flash Session on LLMs and Text Mining: Winner
           
        
        Recent Interviews
        
          - Powering Hyperscalers in Western PA - WPXI-TV’s Our Region’s Business features our deep dive with Bill Flanagan on powering hyperscale data centers in Western Pennsylvania.
- Discussing the Impact of DeepSeek - CMU's Where What If Becomes What's Next podcast featured my thoughts on the impact of DeepSeek on the generative AI race.
- Managing AI's Energy Demand and Climate Goals - An Energy Week panel I moderated on the role of the electric grid, the energy mix, measurement issues around LLM energy consumption, and climate goals. Hosted by the Scott Institute for Energy Innovation.
        Selected Opinion Pieces
        
          - Does Algorithmic Pricing Carry a Risk of Price Collusion? - Published in Tepperspectives, May 21, 2025. 
- Solutions to the AI Energy Demand - Published in Tepperspectives, Feb 6, 2025. 
- Can Gen AI Search Overtake Google? - Published in Tepperspectives, Dec 9, 2024. 
- Google’s Cookie Conundrum: Privacy and Ads - Published in Tepperspectives, Sep 17, 2024. 
- Algorithmic Pricing: Understanding the FTC's Case Against Amazon - Published in CMU News, Oct 13, 2023. 
        Personal
        I live in Pittsburgh with my wife Kiran, daughter Elin, son Aidan,
          and our Australian Shepherd, Blue Coco.
          Watch Coco catching a frisbee. Kiran is
          a dentist in Fox Chapel, Pittsburgh.